-
- Amiga TAS Cycles
-
- A Study of the Viability of the TAS Instruction on the Zorro II
- Bus
-
- by Dave Haynie
-
- There has been a considerable amount of confusion about the use of the
- TAS, or Test And Set, instruction on Amiga systems for quite some
- time. Historically, the 68000 implementation of the bus lock required
- by TAS has been a headache for many 68000 system designers, and Amiga
- engineers haven't been excluded from this. While the problem of TAS
- alone is an important issue, the evolution of official TAS support on
- Amiga systems further complicates the matter. This article won't give
- a clear answer to the "can I use TAS" question, since that answer
- changes based on the context of TAS's use. What I'm trying to do here
- is explain why TAS is a problem, what various Amiga systems do with
- TAS, and when TAS might be safely used by a Zorro II expansion
- board.
-
- The Need For TAS?
-
- The TAS instruction is designed to support hardware locked semaphores
- between multiple processors in a system. The idea is that making a
- read cycle, ALU operation, and write cycle an atomic operation will
- allow the 68000 to check on the value of a semaphore in memory and set
- this semaphore if currently unset, all without the possibility of
- another processor gaining access to the memory in between the read and
- write phases. Presumably, as long as shared structures in a
- multiprocessing system are all guarded by semaphores, and all
- semaphores are obtained under bus lock protection, then the
- possibility of uncontrolled simultaneous access of these shared
- resources is eliminated.
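-
- As a rough illustration of the semaphore protocol described above,
- here is a minimal sketch in portable C. It assumes a C11 compiler and
- uses atomic_flag_test_and_set() as a stand-in for the 68000 TAS
- instruction; the tas_semaphore type and the tas_* function names are
- hypothetical and do not come from any Amiga header.
-
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical semaphore: flag clear = free, flag set = held. */
typedef struct {
    atomic_flag held;
} tas_semaphore;

#define TAS_SEMAPHORE_INIT { ATOMIC_FLAG_INIT }

/* One indivisible test-and-set, the role TAS plays on the 68000.
 * Returns true if the caller obtained the semaphore (it was unset
 * and is now set), false if somebody else already holds it. */
bool tas_try_obtain(tas_semaphore *s)
{
    assert(s != NULL);
    /* atomic_flag_test_and_set() returns the PREVIOUS flag value. */
    return !atomic_flag_test_and_set(&s->held);
}

/* Spin until the semaphore is obtained. */
void tas_obtain(tas_semaphore *s)
{
    while (!tas_try_obtain(s)) {
        /* busy-wait: another processor holds the semaphore */
    }
}

/* Release the semaphore so another processor can obtain it. */
void tas_release(tas_semaphore *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->held);
}
```
-
- The point of the sketch is that the test and the set cannot be
- separated: a second caller of tas_try_obtain() sees the flag already
- set and backs off until tas_release() clears it.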
-
- In the general case, this is called bus locking, and it's a
- perfectly valid concept. The problem stems from the way the 68000
- designers chose to lock the bus. This TAS-generated bus lock becomes
- a special case 68000 cycle that's often not correctly supported by
- hardware. For this reason, TAS should never be used on Zorro II
- devices that aren't specifically designed to support it. As long as
- you're not dealing with multiprocessor systems, TAS isn't necessary.
- The remainder of this document is concerned with multiprocessing
- systems, the cases in which TAS is very useful, if not actually
- necessary.
-
- It should be noted that the TAS instruction is neither necessary nor
- supported on the Amiga's Chip memory bus. Although the Amiga's
- blitter and Copper do serve to provide a form of multiprocessing on
- the Chip bus, this is a specialized multiprocessing where the blitter
- and Copper basically act as slaves to the main 680x0 processor. The
- TAS instruction is used to support structure locking between peer CPUs
- which both share the same Amiga OS structures. Any alternate
- processor that acts as a slave, such as the Amiga chips or perhaps a
- DMA device, will not share system structures and therefore does not
- need the hardware bus locking provided by TAS. Also, the design of
- the Chip bus makes it impossible to support any use of the 68000 TAS
- instruction to Chip memory anyway. The custom chips get bus access
- for two out of every four system clock cycles, holding the 68000 off
- when they're on the bus. The double TAS cycle conflicts with this
- mechanism, yielding unpredictable results.
-
-
-
- A 68000 Bus Locking Primer
-
- As mentioned, the problem with the use of TAS is primarily
- based on the way the 68000 implements its bus locking. Unlike 68020
- and 68030 processors, which drive a special Read-Modify-Write Cycle
- signal (RMC*) to indicate bus locking, the 68000 runs a modification
- of the normal 68000 bus cycle.
-
-
-
- ******************* FIGURE 1 ******************
-
-
-
- The normal 68000 bus cycle starts with the assertion of addresses and
- the read/write line (R/W), followed shortly by the assertion of
- address strobe (AS*) and the data strobes (UDS* and LDS*). AS* is
- asserted during state S2, while the data strobes are asserted during
- state S2 for read cycles and state S4 for write cycles. The 68000
- drives data for
- write cycles just before the data strobes, and samples data on read
- cycles on the falling edge of its clock at state S6. In either case,
- it samples the data transfer acknowledge strobe (DTACK*) on the
- falling edge of state S4, and inserts wait states until DTACK* is
- asserted. Once DTACK* is sampled, the cycle concludes with AS*, UDS*,
- LDS*, and most other signals being negated during S7 or shortly
- thereafter. Figure 1 illustrates a standard 68000 read and write
- cycle.
-
- The above read cycle followed by write cycle is quite sufficient for
- handling semaphores in a single processor system, if supported within
- a single instruction. The 68000 family CPUs don't permit interrupts
- to be serviced except at instruction boundaries, so the 68000 bit
- test-and-modify instructions (BSET, for example) are adequate to
- support software semaphores in an interrupt driven multitasking
- single processor system. Bus
- arbitration, on the other hand, is managed at cycle boundaries for the
- most part. Ordinary I/O DMA devices such as hard disks don't
- interfere with this locking either, since such DMA devices don't
- access system structures. All 680x0 locked cycles, including TAS,
- complete atomically even in the presence of a pending DMA request, so
- true bus locking isn't strictly necessary to handle semaphores between
- DMA processors using normal 680x0 bus arbitration to share memory. A
- bus locking mechanism is, however, the proper way to handle resource
- locking of structures in a cycle-shared mailbox memory, such as the
- shared memory that's implemented on Amiga Bridgecards.
-
-
-
- ******************* FIGURE 2 ******************
-
-
-
- The 68000 TAS cycle works much like a normal read followed by a normal
- write, except that AS* is not negated between the read and write
- portions of the compound cycle. Bus devices that expect AS* to
- qualify cycle boundaries, or devices that anticipate a minimum delay
- between the assertion of DTACK* and the negation of AS*, will fail
- with TAS cycles. Both AS* and the data strobes must be used to
- identify the subcycle boundaries within the full TAS cycle. The bus
- address doesn't change throughout the cycle, while the R/W line and
- data bus obey the timing of the individual subcycles. DTACK* and
- other slave device signals will similarly respect the timing of the
- individual subcycles, as based on UDS* or LDS*. Additionally, the TAS
- cycle contains extra clock cycles for the ALU operation, so that
- instead of the minimal eight clocks for a read cycle immediately
- followed by a write cycle, the minimum TAS cycle takes ten clocks.
- Figure 2 illustrates a 68000 TAS cycle.
-
-
-
- Zorro II Bus Issues
-
- The Zorro II expansion bus, being originally based on the 68000 CPU
- bus, by extension can run TAS cycles. However, their use on the Zorro
- II bus is far from straightforward. Although the original Zorro II
- specifications and the updated specifications published in the
- A500/A2000 Technical Reference Manual both require the M68000
- User's Manual as a companion specification, the use of TAS was never
- explicitly discussed for use on the bus. Additionally, the use of TAS
- in Amiga Chip memory was explicitly forbidden, and at the time of the
- Zorro II specification writing, generally considered unsupported
- anywhere on Amiga computers.
-
- Because of this ambiguity, many Zorro II devices don't support TAS
- cycles at all. Therefore, the use of TAS is explicitly forbidden on
- any Zorro II device that isn't specifically known to support TAS.
- Even simple memory boards probably act unpredictably when driven with
- a TAS cycle.
-
- Circuits that do support TAS must be very carefully designed
- to support it properly. Due to clock skewing between the 68000 and
- the backplane, or alternate bus masters and the backplane, no Zorro II
- slave PIC that controls DTACK* via XRDY may assume it knows the delay
- between its negation of XRDY and the end of the cycle. For any
- XRDY-controlled Zorro II cycle, the cycle's end is defined by the
- negation of AS* or UDS*/LDS*, whichever comes first. This is a
- general rule (nothing specific to the TAS cycle), but one would expect
- the situations in which TAS is necessary, such as locking shared
- memory resources, to be more sensitive to any cycle delays. All
- shared memory circuits must have some kind of memory arbiter, and such
- arbiters might be easier to design considering only outgoing signals
- such as XRDY. However, designs that don't use AS* to end normal
- cycles and UDS*/LDS* to end the read half of the TAS cycle are very
- likely to cause trouble.
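-
- The strobe-driven rule above can be sketched as a small software
- model. This is only an illustration, not a hardware design: the
- strobe_sample type, the function names, and the idea of sampling the
- strobes once per step are all assumptions made for the example. The
- first function ends a (sub)cycle on the negation of AS* or of both
- data strobes, whichever comes first; the second watches AS* alone.
-
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One sample of the bus strobes. The real signals are active low;
 * here true simply means "asserted" to keep the logic readable. */
typedef struct {
    bool as;   /* AS*  asserted */
    bool uds;  /* UDS* asserted */
    bool lds;  /* LDS* asserted */
} strobe_sample;

/* Count (sub)cycle ends the way a strobe-driven slave would see them:
 * a transfer is active while AS* and at least one data strobe are
 * asserted, and it ends as soon as either condition goes away. */
size_t count_strobe_driven_ends(const strobe_sample *trace, size_t n)
{
    size_t ends = 0;
    bool active = false;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool selected = trace[i].as && (trace[i].uds || trace[i].lds);
        if (active && !selected)
            ends++;
        active = selected;
    }
    return ends;
}

/* Count cycle ends the way an AS*-only slave would see them. */
size_t count_as_only_ends(const strobe_sample *trace, size_t n)
{
    size_t ends = 0;
    bool active = false;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (active && !trace[i].as)
            ends++;
        active = trace[i].as;
    }
    return ends;
}
```
-
- Fed a TAS-shaped trace, in which AS* stays asserted while the data
- strobe drops between the read and write halves, the strobe-driven
- model reports both subcycles while the AS*-only model reports a
- single cycle and so never notices the end of the read half.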
-
-
-
- ******************* FIGURE 3 ******************
-
-
-
- Figure 3 illustrates how a reasonably designed XRDY based control
- circuit can result in an unexpected wait state. This example
- assumes A2000 system timing. Essentially, what this hypothetical
- device wants to do is terminate the current cycle by negating the XRDY
- line that it pulled properly at the start of the cycle. The XRDY
- signal is being driven on the rising edge of the 7MHz clock, the
- result of a synchronous state machine that runs with 7MHz timing.
-
- The A2000's Zorro II bus uses a buffered version of the 68000's
- 7MHz clock. This can introduce clock delays of roughly 10ns between
- the actual 68000 clock and the bus clock. The hypothetical circuit
- uses the rising edge of the clock to enable XRDY. I'm assuming a
- total worst-case delay of 35ns here, which must take into account the
- PIC logic in the XRDY path: clock buffer, flip-flop and state machine
- logic, and an open collector output buffer to drive XRDY onto the bus.
- While XRDY circuits may be faster in some designs, this example is not
- unreasonable. Once XRDY is asserted, there's the Gary chip delay to
- consider, which can be slower than 25ns in worst case. Thus, in this
- example, DTACK* isn't asserted until over 70ns after the real 7MHz
- clock transition, even ignoring other small delays that can be
- incurred by bus loading or asymmetry of the 7MHz clock. In order to
- assure that DTACK* is sampled by the 68000 without an additional wait
- state, it must be asserted at the 68000 at least 10ns prior to the
- falling edge of the 7MHz clock.
-
- Clearly, we're missing that in this example by at least 10ns, so the
- wait state is added. Any logic on this hypothetical card that assumed
- the cycle would be over two clocks after XRDY was clocked will fail.
- Likewise, logic on that card that assumes the extra wait state will
- always be added will fail just as surely. This example assumes worst
- case, but in a best case situation, the wait state would never happen.
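-
- The arithmetic behind this example can be written out as a small
- timing budget. The delay figures for the worst case are the ones
- quoted above (10ns clock skew, 35ns PIC path, 25ns Gary delay, 10ns
- DTACK* setup). The 70ns half period is an assumption, taken from a
- nominal 140ns period for the "7MHz" clock, and the xrdy_timing type
- and function names are made up for this sketch.
-
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All times are in nanoseconds. */
typedef struct {
    int clock_skew;     /* 68000 clock to buffered bus clock          */
    int pic_xrdy_path;  /* PIC clock buffer, state machine, OC driver */
    int gary_delay;     /* XRDY to DTACK* through the backplane logic */
    int half_period;    /* rising edge to next falling edge (assumed) */
    int dtack_setup;    /* DTACK* setup before the falling edge       */
} xrdy_timing;

/* Slack between the latest acceptable DTACK* arrival and the actual
 * arrival, both measured from the real 68000 clock's rising edge.
 * A negative result means DTACK* is late and a wait state is added. */
int dtack_slack_ns(const xrdy_timing *t)
{
    assert(t != NULL);
    int arrival  = t->clock_skew + t->pic_xrdy_path + t->gary_delay;
    int deadline = t->half_period - t->dtack_setup;
    return deadline - arrival;
}

/* True when the 68000 misses DTACK* and inserts a wait state. */
bool wait_state_inserted(const xrdy_timing *t)
{
    return dtack_slack_ns(t) < 0;
}
```
-
- With the worst-case figures the arrival time is 70ns against a 60ns
- deadline, a 10ns miss, which matches the wait state described above.
- A card with a faster XRDY path on a faster backplane lands on the
- other side of the deadline, which is exactly why the card must not
- assume either outcome.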
-
- Also, the characteristics of the Zorro II bus interface come into
- play. An A2000 may take 25ns to recognize XRDY, but another backplane
- may treat it faster or slower. The XRDY signal has always been
- treated as an asynchronous signal, and there is no guaranteed range of
- synchronous operation defined for this signal. The AS* or UDS*/LDS*
- defined end of cycle must always be explicitly considered by a proper
- Zorro II XRDY controlled slave device (this is also a requirement
- defined by the 68000 specifications, though they don't explain the
- likely problems of not being strobe driven) [see footnote]***.
-
- ***[footnote] I don't claim these problems should be immediately
- obvious to everyone. Once mentioned, it should be pretty easy to
- recognize that pre-supposing the end of a bus cycle is dangerous. I
- ran into this exact type of problem in the B revision of the A3000's
- Buster chip, so it can happen to anyone. The current Buster, like any
- good design, bases its end-of-cycle actions on the negation of the
- appropriate strobes, never on an assumption about when DTACK* will
- take effect.
-
- Any designs that must behave synchronously must assert OVR* to release
- the backplane's control of DTACK*, and then drive DTACK*
- synchronously. The Zorro II requirements for synchronous operation
- with DTACK* controlled termination require that DTACK* be set up to
- the falling edge of S4 by 30ns, which is a sufficient amount of time
- to meet both the actual 68000 setup time and offset the effect of any
- clock skews on the backplane. Alternate bus masters such as DMA
- driven hard disk controllers must of course take into account
- synchronous operation with DTACK* as one of the list of bus
- constraints they must meet.
-
-
-
- TAS and the A2500
-
- At the time the A2620 and A2630 were designed, the use of TAS on the
- expansion bus was still considered a dubious practice. While TAS can,
- within the limits set forth in this document, be used successfully on
- some Amiga systems, it is not completely supported in either A2500
- configuration. As mentioned, the TAS cycle performs two major
- functions. The first, which is supported by the A2500, is to provide
- an atomic cycle that will arbitrate a semaphore between DMA devices on
- the bus. With A2500s, TAS is guaranteed to be atomic with respect to
- bus arbitration. However, it does not generate 68000 compatible TAS
- cycles on the expansion bus. The TAS instruction executed on either
- A2500 system will result in a standard 68000 read cycle followed by a
- standard 68000 write cycle. Thus, there is no bus locking, so TAS
- won't be sufficient to arbitrate semaphores in shared mailbox type
- memory. The best solution for such mailbox arbitration is to use
- software based spin lock semaphores or other methods which permit safe
- multiprocessing semaphores without the hardware bus locking. If
- that's unacceptable for all Amigas, the TAS solution can be invoked on
- Amiga systems that support it, while the software-only solution can be
- invoked on A2500s or other 2500-class machines. Software can
- detect a real A2500 by querying the expansion library for its
- accelerator board. A2500s show up as Commodore products
- (manufacturer number $0202) with product numbers $50 (for A2500/20)
- or $51 (for A2500/30).
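-
- One possible shape for the software-only mailbox lock mentioned
- above is Peterson's algorithm for two processors, which needs only
- ordinary reads and writes and no locked bus cycle. This is my choice
- of technique for illustration, not something the hardware or the OS
- prescribes. The sketch assumes a C11 compiler, exactly two
- contenders numbered 0 and 1, and that on real mailbox memory each
- field is an uncached location accessed in program order; the
- mailbox_lock type and function names are hypothetical.
-
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock state that would live in the shared mailbox memory. */
typedef struct {
    atomic_int interested[2]; /* 1 while processor N wants the lock */
    atomic_int turn;          /* which processor must wait on a tie */
} mailbox_lock;

void mailbox_lock_init(mailbox_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->interested[0], 0);
    atomic_init(&l->interested[1], 0);
    atomic_init(&l->turn, 0);
}

/* Non-blocking attempt: announce interest, yield the turn, then back
 * off if the other processor is interested and we are the one who
 * must wait. Returns true when the caller now owns the lock. */
bool mailbox_try_enter(mailbox_lock *l, int me)
{
    assert(l != NULL && (me == 0 || me == 1));
    int other = 1 - me;
    atomic_store(&l->interested[me], 1);
    atomic_store(&l->turn, other);
    if (atomic_load(&l->interested[other]) &&
        atomic_load(&l->turn) == other) {
        atomic_store(&l->interested[me], 0);  /* withdraw and retry later */
        return false;
    }
    return true;
}

/* Blocking form: the classic Peterson entry protocol. */
void mailbox_enter(mailbox_lock *l, int me)
{
    assert(l != NULL && (me == 0 || me == 1));
    int other = 1 - me;
    atomic_store(&l->interested[me], 1);
    atomic_store(&l->turn, other);
    while (atomic_load(&l->interested[other]) &&
           atomic_load(&l->turn) == other) {
        /* spin until the other side leaves or yields the turn */
    }
}

void mailbox_leave(mailbox_lock *l, int me)
{
    assert(l != NULL && (me == 0 || me == 1));
    atomic_store(&l->interested[me], 0);
}
```
-
- Each side calls mailbox_enter() with its own number before touching
- the shared structure and mailbox_leave() afterwards. Because no
- single step is a read-modify-write, the scheme does not depend on
- whether the host generates a true 68000 TAS cycle.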
-
- While you can explicitly check for operation on a true A2500, you
- cannot as easily determine in all cases if the machine is a similarly
- enhanced A2000 running with a third party accelerator board. Without
- autoconfiguration units such as on real A2500s, software can only tell
- the CPU type. Some third party coprocessor boards are available
- without memory or other permanent autoconfiguration units that
- software could use to identify the accelerator manufacturer. Even
- some of those boards with memory don't autoconfigure it like they
- should. Commodore does not have a list of which (if there are any)
- third party accelerator cards will fully support TAS cycles anyway.
- It is therefore important to assume that any 68020 or 68030 Amiga,
- except for the A3000, is not likely to support TAS and should instead
- handle bus locking in software.
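-
- The selection rules in this section can be summarized as a small
- decision routine. This is a self-contained sketch, not Amiga code:
- the board_id array stands in for whatever list of autoconfigured
- boards the software obtains (on a real system presumably through
- expansion.library, for instance with FindConfigDev()), and the
- is_a3000 flag is assumed to be determined elsewhere. The
- manufacturer and product numbers are the ones given above; all type
- and function names are hypothetical.
-
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MANUF_COMMODORE 0x0202u  /* manufacturer number from the text */
#define PROD_A2500_20   0x50u    /* A2500/20 accelerator              */
#define PROD_A2500_30   0x51u    /* A2500/30 accelerator              */

/* Stand-in for one autoconfigured board. */
typedef struct {
    unsigned manufacturer;
    unsigned product;
} board_id;

typedef enum { CPU_68000, CPU_68020, CPU_68030 } cpu_type;
typedef enum { LOCK_HARDWARE_TAS, LOCK_SOFTWARE } lock_method;

/* True if the board list contains a real A2500 accelerator. */
bool has_a2500_accelerator(const board_id *boards, size_t n)
{
    assert(boards != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (boards[i].manufacturer == MANUF_COMMODORE &&
            (boards[i].product == PROD_A2500_20 ||
             boards[i].product == PROD_A2500_30))
            return true;
    }
    return false;
}

/* Pick a locking method for shared mailbox memory on a board that is
 * itself known to support TAS cycles. */
lock_method choose_lock_method(cpu_type cpu, bool is_a3000,
                               const board_id *boards, size_t n)
{
    if (has_a2500_accelerator(boards, n))
        return LOCK_SOFTWARE;       /* A2500: no bus-locked TAS cycle */
    if (cpu == CPU_68000 || is_a3000)
        return LOCK_HARDWARE_TAS;   /* real 68000 TAS cycle available */
    return LOCK_SOFTWARE;           /* unknown 68020/68030 accelerator */
}
```
-
- The last branch is the conservative default: a 68020 or 68030
- machine that is neither an A3000 nor an identifiable A2500 is
- treated as if it cannot generate a bus-locked TAS cycle.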
-
-
-
- TAS and the A3000
-
- The A3000 does support 68000 compatible TAS cycles on its
- implementation of the Zorro II bus, but there are some caveats that
- must be considered. Any differences from the A3000's implementation
- of the 68000-compatible TAS cycle stem from the fact that TAS is
- handled differently by the 68030. The primary difference in 68030 TAS
- handling is that the 68030 runs two standard 68030 cycles in response
- to TAS, and locks the bus by way of the RMC* signal. Also, while the
- 68000 specified the length of the ALU cycle between the read and write
- phases of the TAS instruction, the 68030 specifications do not
- quantify this part of the cycle. When supporting the 68000 TAS cycle,
- the A3000's CPU is also running asynchronously to the Zorro II bus, so
- clock synchronization delays also creep in.
-
- The result of all of this is that the length of the entire Zorro II
- TAS cycle on the Amiga 3000 bus can vary by one, and possibly two,
- 7MHz clock cycles. The A3000 TAS cycle is shown in Figure 4. The
- A3000 starts all bus cycles as Zorro III cycles, with the assertion of
- the Full Cycle Strobe (FCS*) shortly following the 68030's assertion
- of AS*. When a Zorro II cycle is indicated, FCS* is sampled on the
- falling edge of CDAC, and then on the rising edge of the 7MHz clock.
- The diagram shows where FCS* and AS* might have been to give rise to
- the Zorro II cycle starting here. The double-clocked FCS* will
- quickly create the A3000's 68000 compatible address strobe, CCS*, and
- in the case of a read cycle, the UDS*/LDS* equivalents called DS3* and
- DS2*, respectively.
-
- The remainder of the read phase runs just as a 68000 cycle until
- DTACK* is sampled at the falling edge of S4. DTACK* would normally
- cause CCS* and the data strobes to be negated at S6, but since this is
- a TAS cycle, only the data strobes are negated. Also at this point,
- the expansion bus data is latched for the 68030 and the 68030's
- DSACK1* signal is asserted on the A3000's local bus. This is the
- first cause of the cycle length ambiguity. The 68030 is of course
- running asynchronous to the Zorro II cycle in progress, and so the
- time between the assertion of DSACK1* and the 68030's negation of AS*
- can be as quick as 40ns, for an A3000/25 that just manages to hit
- DSACK1* at the sample point. It might also be as long as 120ns for an
- A3000/16 that just misses the sample point. If the worst-case timing
- of the 68030 is considered, this last delay could actually be as long
- as 150ns. Once AS* is negated, DSACK1* will be negated and the 68030
- cycle ends, though the Zorro II within Zorro III cycle-in-progress
- remains active.
-
-
-
- ******************* FIGURE 4 ******************
-
-
-
- The next half of the TAS cycle starts with the 68030's assertion of
- AS* again. There is more cycle length ambiguity introduced here,
- since the 68030 specification does not document the number of clocks a
- TAS ALU cycle will take. Since any 68030 clock cycle is only a
- fraction of a Zorro II cycle, this isn't expected to be a drastic
- ambiguity, but it's something to consider. Since the FCS* strobe is
- already asserted, the A3000 bus controller effectively samples AS*,
- first with the falling CDAC clock and then with the rising 7MHz clock,
- to start the write section of the TAS cycle. Based on the intercycle
- ambiguities as seen from the 68030's perspective, this second bus
- cycle will sometimes start early, other times late. Figure 4 shows
- the likely best and worst case secondary cycles, which ultimately
- depend on whether or not AS* is asserted quickly enough to be clocked
- by the falling CDAC in S11, or whether it must wait until S13.
-
- However that gets settled, the bus controller drives DS3*/DS2*,
- samples DTACK*, and ultimately negates both data strobes and CCS*.
- Following this end of the Zorro II cycle, the DSACK1* strobe is
- asserted onto the local bus, the 68030 sees this within 40ns or 60ns
- of it being asserted, and AS* followed by DSACK1* are negated, ending
- the local bus cycle. FCS* is quickly negated based on AS*, and the
- cycle ends.
-
-
-
- The Final Word
-
- So that's a fairly complete description of the behavior of TAS on all
- currently known implementations of the Zorro II bus. While things
- like XRDY to cycle's end and the total length of the TAS cycle aren't
- always the same between the different backplanes, this doesn't matter
- for correctly designed expansion cards.
-
- Use XRDY for asynchronous operation, and OVR*, DTACK*, and C7M for
- synchronous or asynchronous operation. Any cards for which these
- backplane differences do matter should avoid using TAS, since they
- will get into trouble sooner or later. The main concern for devices
- that want to support TAS will be the incomplete operation of TAS in
- A2500 and 2500 look-a-like
- systems. These devices must provide some software lock mechanism when
- on an A2500, which can be replaced with hardware locking on 68000
- based Amigas and A3000s, if necessary.
-
- An A3000 specific design can be a Zorro III card and use the /LOCK
- signal and the TAS, CAS, or CAS2 instructions for bus locking,
- bypassing all the 68000-inspired nonsense completely.
-